Language Model Expansion Using Webdata for Spoken Document Retrieval

Authors

Ryo Masumura

Seongjun Hahm

Akinori Ito

Abstract

In recent years, there has been increasing demand for ad hoc retrieval of spoken documents. Existing text retrieval methods can be applied by transcribing spoken documents into text with a Large Vocabulary Continuous Speech Recognizer (LVCSR). However, recognition errors and out-of-vocabulary (OOV) words severely degrade retrieval performance. To solve these problems, we previously proposed an expansion method that compensates for errors in the transcription by using text data downloaded from the Web. In this paper, we introduce two improvements to the existing document expansion framework. First, we use a large-scale sample database of webdata as the source of relevant documents, thus avoiding the bias introduced by choosing keywords in the existing methods. Second, we use a document retrieval method based on a statistical language model (SLM), which is a popular framework in information retrieval, and also propose a new smoothing method that takes recognition errors and missing keywords into account. Retrieval experiments show that the proposed methods yield good results.
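For readers unfamiliar with SLM-based retrieval, the general idea can be sketched as query-likelihood ranking: each document is scored by the probability that its language model generates the query, with smoothing against a collection model so that words missing from a document do not zero out the score. The sketch below uses Jelinek-Mercer interpolation, a standard smoothing technique. It is not the smoothing method proposed in the paper, which the abstract does not specify. The function names, the `lam` parameter, and the toy documents are all illustrative assumptions.

```python
import math
from collections import Counter


def score(query_terms, doc_terms, collection_counts, collection_len, lam=0.5):
    """Log query likelihood under a Jelinek-Mercer smoothed unigram model.

    lam is the weight given to the collection model (an assumed default).
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    log_p = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_col = collection_counts.get(term, 0) / collection_len
        p = (1.0 - lam) * p_doc + lam * p_col
        if p == 0.0:
            # Term unseen in the whole collection: the query cannot be generated.
            return float("-inf")
        log_p += math.log(p)
    return log_p


def rank(query_terms, docs, lam=0.5):
    """Return document ids sorted from most to least likely to generate the query."""
    collection_counts = Counter()
    for terms in docs.values():
        collection_counts.update(terms)
    collection_len = sum(collection_counts.values())
    scored = {
        doc_id: score(query_terms, terms, collection_counts, collection_len, lam)
        for doc_id, terms in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Toy data (hypothetical): d1 stands in for a transcript, d2 for an expanded one.
docs = {
    "d1": "speech recognition errors hurt retrieval".split(),
    "d2": "web data expands the language model".split(),
}
ranking = rank(["language", "model"], docs)
```

In this toy setup, a document that lacks the query words still receives a finite score through the collection model. This is the same property that makes smoothing relevant when keywords are lost to recognition errors.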

Similar resources

Multi-scale document expansion in English-Mandarin cross-language spoken document retrieval

This paper presents the application of document expansion using a side collection to a cross-language spoken document retrieval (CL-SDR) task to improve retrieval performance. Document expansion is applied to a series of English-Mandarin CL-SDR experiments using selected retrieval models (probabilistic belief network, vector space model, and HMM-based retrieval model). English textual queries ar...

Document Expansion using a Side Collection for Monolingual and Cross-language Spoken Document Retrieval

This paper presents a method of document expansion using a side collection for improving the overall performance in retrieving spoken documents using text queries. This method is applied to Chinese spoken document retrieval (SDR) tasks where a series of experiments have been carried out for both monolingual and cross-language SDR systems. In our monolingual retrieval experiments, Cantonese broa...

The CLEF 2003 Cross-Language Spoken Document Retrieval Track

The current expansion in collections of natural language based digital documents in various media and languages is creating challenging opportunities for automatically accessing the information contained in these documents. This paper describes the CLEF 2003 track investigation of Cross-Language Spoken Document Retrieval (CLSDR) combining information retrieval, cross-language translation and sp...

Effects of Query Expansion for Spoken Document Passage Retrieval

One of the major challenges for spoken document retrieval is how to handle speech recognition errors within the target documents. Query expansion is promising for this challenge. In this paper, we apply relevance models, a type of query expansion method, for the spoken document passage retrieval task. We adapted the original relevance model for passage retrieval. We also extended it to benefit ...

ETH TREC-6: Routing, Chinese, Cross-Language and Spoken Document Retrieval

ETH Zurich's participation in TREC-6 consists of experiments in the main routing task, both manual and automatic runs in the Chinese retrieval track, cross-language retrieval in each of German, French and English as part of the new cross-language retrieval track, and experiments in speech recognition and retrieval under the new spoken document retrieval track. This year our routing experiments...

Publication date: 2011